Overview

Dataset statistics

Number of variables: 15
Number of observations: 5163
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 499.7 KiB
Average record size in memory: 99.1 B

Variable types

NUM: 12
CAT: 3

Reproduction

Analysis started: 2020-07-17 21:04:31.148944
Analysis finished: 2020-07-17 21:05:15.128606
Duration: 43.98 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Zeros: citric acid has 128 (2.5%) zeros

Variables

fixed acidity
Real number (ℝ≥0)

Distinct count: 87
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.1592194460584935
Minimum: 3.8
Maximum: 12.4
Zeros: 0
Zeros (%): 0.0%
Memory size: 40.3 KiB

Quantile statistics

Minimum: 3.8
5th percentile: 5.6
Q1: 6.4
Median: 6.9
Q3: 7.7
95th percentile: 9.6
Maximum: 12.4
Range: 8.6
Interquartile range (IQR): 1.3

Descriptive statistics

Standard deviation: 1.207939892
Coefficient of variation (CV): 0.1687250825
Kurtosis: 2.427968824
Mean: 7.159219446
Median Absolute Deviation (MAD): 0.6
Skewness: 1.264268435
Sum: 36963.05
Variance: 1.459118782
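
Several of the figures above are derived from others: the mean is the sum divided by the number of observations, the variance is the squared standard deviation, the CV is the standard deviation divided by the mean, the range is maximum minus minimum, and the IQR is Q3 minus Q1. The following is a minimal sketch of a consistency check over the fixed acidity values as reported; the helper names `derive` and `mismatches` are hypothetical and not part of pandas-profiling.

```python
import math

# Values copied from the fixed acidity statistics in this report.
reported = {
    "n": 5163, "sum": 36963.05, "mean": 7.159219446,
    "std": 1.207939892, "variance": 1.459118782, "cv": 0.1687250825,
    "min": 3.8, "max": 12.4, "q1": 6.4, "q3": 7.7,
    "range": 8.6, "iqr": 1.3,
}


def derive(s):
    """Recompute the derived statistics from the primary ones."""
    return {
        "mean": s["sum"] / s["n"],
        "variance": s["std"] ** 2,
        "cv": s["std"] / s["mean"],
        "range": s["max"] - s["min"],
        "iqr": s["q3"] - s["q1"],
    }


def mismatches(s, rel_tol=1e-6):
    """Return names of derived statistics that disagree with the reported ones."""
    return [
        name
        for name, value in derive(s).items()
        if not math.isclose(value, s[name], rel_tol=rel_tol)
    ]


print(mismatches(reported))
```

The tolerance absorbs the rounding applied to the displayed values; a non-empty list would point to a transcription error rather than a problem in the data.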
Histogram with fixed size bins (bins=10)
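Common values

Value | Count | Frequency (%)
6.8 | 274 | 5.3%
6.6 | 268 | 5.2%
6.4 | 243 | 4.7%
7 | 222 | 4.3%
6.9 | 219 | 4.2%
6.7 | 207 | 4.0%
7.2 | 200 | 3.9%
7.1 | 197 | 3.8%
6.5 | 196 | 3.8%
6.2 | 174 | 3.4%
Other values (77) | 2963 | 57.4%

Minimum 5 values

Value | Count | Frequency (%)
3.8 | 1 | < 0.1%
3.9 | 1 | < 0.1%
4.2 | 2 | < 0.1%
4.4 | 3 | 0.1%
4.5 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
12.4 | 4 | 0.1%
12.3 | 3 | 0.1%
12.2 | 2 | < 0.1%
12.1 | 1 | < 0.1%
12 | 7 | 0.1%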

volatile acidity
Real number (ℝ≥0)

Distinct count: 172
Unique (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3382916908773969
Minimum: 0.08
Maximum: 1.01
Zeros: 0
Zeros (%): 0.0%
Memory size: 40.3 KiB

Quantile statistics

Minimum: 0.08
5th percentile: 0.16
Q1: 0.23
Median: 0.29
Q3: 0.4
95th percentile: 0.67
Maximum: 1.01
Range: 0.93
Interquartile range (IQR): 0.17

Descriptive statistics

Standard deviation: 0.1600232649
Coefficient of variation (CV): 0.4730333887
Kurtosis: 1.545287171
Mean: 0.3382916909
Median Absolute Deviation (MAD): 0.08
Skewness: 1.318794115
Sum: 1746.6
Variance: 0.02560744531
Histogram with fixed size bins (bins=10)
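Common values

Value | Count | Frequency (%)
0.28 | 229 | 4.4%
0.24 | 217 | 4.2%
0.26 | 217 | 4.2%
0.25 | 184 | 3.6%
0.27 | 182 | 3.5%
0.22 | 180 | 3.5%
0.23 | 177 | 3.4%
0.2 | 175 | 3.4%
0.3 | 166 | 3.2%
0.32 | 162 | 3.1%
Other values (162) | 3274 | 63.4%

Minimum 5 values

Value | Count | Frequency (%)
0.08 | 2 | < 0.1%
0.085 | 1 | < 0.1%
0.09 | 1 | < 0.1%
0.1 | 6 | 0.1%
0.105 | 4 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1.01 | 1 | < 0.1%
1.005 | 2 | < 0.1%
1 | 2 | < 0.1%
0.98 | 3 | 0.1%
0.975 | 1 | < 0.1%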

citric acid
Real number (ℝ≥0)

ZEROS

Distinct count: 83
Unique (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3148944412163471
Minimum: 0.0
Maximum: 0.88
Zeros: 128
Zeros (%): 2.5%
Memory size: 40.3 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.04
Q1: 0.24
Median: 0.31
Q3: 0.39
95th percentile: 0.54
Maximum: 0.88
Range: 0.88
Interquartile range (IQR): 0.15

Descriptive statistics

Standard deviation: 0.1404393515
Coefficient of variation (CV): 0.4459886651
Kurtosis: 0.8018415846
Mean: 0.3148944412
Median Absolute Deviation (MAD): 0.07
Skewness: 0.1695068094
Sum: 1625.8
Variance: 0.01972321144
Histogram with fixed size bins (bins=10)
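Common values

Value | Count | Frequency (%)
0.3 | 253 | 4.9%
0.32 | 239 | 4.6%
0.28 | 231 | 4.5%
0.49 | 217 | 4.2%
0.34 | 202 | 3.9%
0.26 | 199 | 3.9%
0.29 | 195 | 3.8%
0.31 | 187 | 3.6%
0.24 | 180 | 3.5%
0.27 | 179 | 3.5%
Other values (73) | 3081 | 59.7%

Minimum 5 values

Value | Count | Frequency (%)
0 | 128 | 2.5%
0.01 | 31 | 0.6%
0.02 | 42 | 0.8%
0.03 | 26 | 0.5%
0.04 | 33 | 0.6%

Maximum 5 values

Value | Count | Frequency (%)
0.88 | 1 | < 0.1%
0.86 | 1 | < 0.1%
0.82 | 2 | < 0.1%
0.81 | 1 | < 0.1%
0.8 | 1 | < 0.1%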

residual sugar
Real number (ℝ≥0)

Distinct count: 311
Unique (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.05370908386597
Minimum: 0.6
Maximum: 22.6
Zeros: 0
Zeros (%): 0.0%
Memory size: 40.3 KiB

Quantile statistics

Minimum: 0.6
5th percentile: 1.1
Q1: 1.8
Median: 2.8
Q3: 7.5
95th percentile: 14.4
Maximum: 22.6
Range: 22
Interquartile range (IQR): 5.7

Descriptive statistics

Standard deviation: 4.398801096
Coefficient of variation (CV): 0.8704104298
Kurtosis: 0.7054124615
Mean: 5.053709084
Median Absolute Deviation (MAD): 1.5
Skewness: 1.253350607
Sum: 26092.3
Variance: 19.34945108
Histogram with fixed size bins (bins=10)
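Common values

Value | Count | Frequency (%)
1.6 | 196 | 3.8%
2 | 191 | 3.7%
1.4 | 190 | 3.7%
1.8 | 184 | 3.6%
1.2 | 167 | 3.2%
1.5 | 149 | 2.9%
2.2 | 149 | 2.9%
1.9 | 143 | 2.8%
2.1 | 142 | 2.8%
1.7 | 138 | 2.7%
Other values (301) | 3514 | 68.1%

Minimum 5 values

Value | Count | Frequency (%)
0.6 | 1 | < 0.1%
0.7 | 7 | 0.1%
0.8 | 25 | 0.5%
0.9 | 36 | 0.7%
0.95 | 3 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
22.6 | 1 | < 0.1%
22 | 1 | < 0.1%
20.8 | 2 | < 0.1%
20.7 | 1 | < 0.1%
20.4 | 1 | < 0.1%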

chlorides
Real number (ℝ≥0)

Distinct count: 166
Unique (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.05382064691071082
Minimum: 0.009
Maximum: 0.204
Zeros: 0
Zeros (%): 0.0%
Memory size: 40.3 KiB

Quantile statistics

Minimum: 0.009
5th percentile: 0.028
Q1: 0.038
Median: 0.047
Q3: 0.064
95th percentile: 0.097
Maximum: 0.204
Range: 0.195
Interquartile range (IQR): 0.026

Descriptive statistics

Standard deviation: 0.0251065838
Coefficient of variation (CV): 0.4664861022
Kurtosis: 5.854709234
Mean: 0.05382064691
Median Absolute Deviation (MAD): 0.011
Skewness: 1.981545358
Sum: 277.876
Variance: 0.00063034055
Histogram with fixed size bins (bins=10)
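Common values

Value | Count | Frequency (%)
0.036 | 165 | 3.2%
0.044 | 160 | 3.1%
0.042 | 157 | 3.0%
0.046 | 156 | 3.0%
0.04 | 151 | 2.9%
0.047 | 145 | 2.8%
0.048 | 143 | 2.8%
0.038 | 141 | 2.7%
0.05 | 140 | 2.7%
0.034 | 136 | 2.6%
Other values (156) | 3669 | 71.1%

Minimum 5 values

Value | Count | Frequency (%)
0.009 | 1 | < 0.1%
0.012 | 1 | < 0.1%
0.013 | 1 | < 0.1%
0.014 | 4 | 0.1%
0.015 | 3 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
0.204 | 1 | < 0.1%
0.201 | 1 | < 0.1%
0.2 | 2 | < 0.1%
0.197 | 2 | < 0.1%
0.194 | 2 | < 0.1%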

free sulfur dioxide
Real number (ℝ≥0)

Distinct count: 122
Unique (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.044450900639163
Minimum: 1.0
Maximum: 101.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 40.3 KiB

Quantile statistics

Minimum: 1
5th percentile: 6
Q1: 17
Median: 28
Q3: 41
95th percentile: 61
Maximum: 101
Range: 100
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 16.85639455
Coefficient of variation (CV): 0.5610485147
Kurtosis: -0.03643826613
Mean: 30.0444509
Median Absolute Deviation (MAD): 12
Skewness: 0.5806901972
Sum: 155119.5
Variance: 284.1380373
Histogram with fixed size bins (bins=10)
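Common values

Value | Count | Frequency (%)
29 | 144 | 2.8%
6 | 136 | 2.6%
26 | 132 | 2.6%
15 | 128 | 2.5%
24 | 126 | 2.4%
31 | 123 | 2.4%
34 | 121 | 2.3%
17 | 120 | 2.3%
23 | 118 | 2.3%
28 | 113 | 2.2%
Other values (112) | 3902 | 75.6%

Minimum 5 values

Value | Count | Frequency (%)
1 | 2 | < 0.1%
2 | 2 | < 0.1%
3 | 48 | 0.9%
4 | 43 | 0.8%
5 | 97 | 1.9%

Maximum 5 values

Value | Count | Frequency (%)
101 | 1 | < 0.1%
98 | 1 | < 0.1%
97 | 1 | < 0.1%
96 | 2 | < 0.1%
95 | 1 | < 0.1%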

total sulfur dioxide
Real number (ℝ≥0)

Distinct count: 269
Unique (%): 5.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 114.81183420491962
Minimum: 6.0
Maximum: 303.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 40.3 KiB

Quantile statistics

Minimum: 6
5th percentile: 19
Q1: 77
Median: 117
Q3: 154
95th percentile: 204
Maximum: 303
Range: 297
Interquartile range (IQR): 77

Descriptive statistics

Standard deviation: 55.70020358
Coefficient of variation (CV): 0.4851433998
Kurtosis: -0.5762256675
Mean: 114.8118342
Median Absolute Deviation (MAD): 38
Skewness: -0.02836019885
Sum: 592773.5
Variance: 3102.512679
Histogram with fixed size bins (bins=10)
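Common values

Value | Count | Frequency (%)
111 | 54 | 1.0%
113 | 49 | 0.9%
98 | 48 | 0.9%
114 | 48 | 0.9%
122 | 47 | 0.9%
128 | 46 | 0.9%
117 | 44 | 0.9%
101 | 43 | 0.8%
126 | 43 | 0.8%
124 | 43 | 0.8%
Other values (259) | 4698 | 91.0%

Minimum 5 values

Value | Count | Frequency (%)
6 | 2 | < 0.1%
7 | 4 | 0.1%
8 | 11 | 0.2%
9 | 13 | 0.3%
10 | 23 | 0.4%

Maximum 5 values

Value | Count | Frequency (%)
303 | 1 | < 0.1%
289 | 1 | < 0.1%
282 | 1 | < 0.1%
278 | 1 | < 0.1%
272 | 1 | < 0.1%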

density
Real number (ℝ≥0)

Distinct count: 17
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9944433468913424
Minimum: 0.987
Maximum: 1.003
Zeros: 0
Zeros (%): 0.0%
Memory size: 40.3 KiB

Quantile statistics

Minimum: 0.987
5th percentile: 0.99
Q1: 0.992
Median: 0.995
Q3: 0.997
95th percentile: 0.999
Maximum: 1.003
Range: 0.016
Interquartile range (IQR): 0.005

Descriptive statistics

Standard deviation: 0.002866464266
Coefficient of variation (CV): 0.002882481214
Kurtosis: -0.7814614351
Mean: 0.9944433469
Median Absolute Deviation (MAD): 0.002
Skewness: -0.01553094364
Sum: 5134.311
Variance: 8.21661739e-06
Histogram with fixed size bins (bins=10)
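Common values

Value | Count | Frequency (%)
0.996 | 649 | 12.6%
0.995 | 570 | 11.0%
0.997 | 554 | 10.7%
0.992 | 550 | 10.7%
0.994 | 541 | 10.5%
0.993 | 536 | 10.4%
0.998 | 465 | 9.0%
0.991 | 444 | 8.6%
0.99 | 327 | 6.3%
0.999 | 203 | 3.9%
Other values (7) | 324 | 6.3%

Minimum 5 values

Value | Count | Frequency (%)
0.987 | 8 | 0.2%
0.988 | 16 | 0.3%
0.989 | 139 | 2.7%
0.99 | 327 | 6.3%
0.991 | 444 | 8.6%

Maximum 5 values

Value | Count | Frequency (%)
1.003 | 3 | 0.1%
1.002 | 4 | 0.1%
1.001 | 24 | 0.5%
1 | 130 | 2.5%
0.999 | 203 | 3.9%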

pH
Real number (ℝ≥0)

Distinct count: 106
Unique (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2258706178578342
Minimum: 2.72
Maximum: 3.85
Zeros: 0
Zeros (%): 0.0%
Memory size: 40.3 KiB

Quantile statistics

Minimum: 2.72
5th percentile: 2.98
Q1: 3.12
Median: 3.22
Q3: 3.33
95th percentile: 3.5
Maximum: 3.85
Range: 1.13
Interquartile range (IQR): 0.21

Descriptive statistics

Standard deviation: 0.1580131148
Coefficient of variation (CV): 0.04898309123
Kurtosis: 0.204007762
Mean: 3.225870618
Median Absolute Deviation (MAD): 0.1
Skewness: 0.3298230235
Sum: 16655.17
Variance: 0.02496814443
Histogram with fixed size bins (bins=10)
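Common values

Value | Count | Frequency (%)
3.22 | 152 | 2.9%
3.16 | 152 | 2.9%
3.14 | 144 | 2.8%
3.2 | 140 | 2.7%
3.24 | 138 | 2.7%
3.15 | 138 | 2.7%
3.19 | 134 | 2.6%
3.18 | 134 | 2.6%
3.12 | 126 | 2.4%
3.17 | 125 | 2.4%
Other values (96) | 3780 | 73.2%

Minimum 5 values

Value | Count | Frequency (%)
2.72 | 1 | < 0.1%
2.74 | 1 | < 0.1%
2.77 | 1 | < 0.1%
2.79 | 2 | < 0.1%
2.8 | 3 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
3.85 | 1 | < 0.1%
3.82 | 1 | < 0.1%
3.81 | 1 | < 0.1%
3.8 | 2 | < 0.1%
3.79 | 1 | < 0.1%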

sulphates
Real number (ℝ≥0)

Distinct count: 90
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5265407708696495
Minimum: 0.22
Maximum: 1.13
Zeros: 0
Zeros (%): 0.0%
Memory size: 40.3 KiB

Quantile statistics

Minimum: 0.22
5th percentile: 0.34
Q1: 0.43
Median: 0.51
Q3: 0.6
95th percentile: 0.78
Maximum: 1.13
Range: 0.91
Interquartile range (IQR): 0.17

Descriptive statistics

Standard deviation: 0.1340264147
Coefficient of variation (CV): 0.2545413806
Kurtosis: 1.131917692
Mean: 0.5265407709
Median Absolute Deviation (MAD): 0.08
Skewness: 0.895018532
Sum: 2718.53
Variance: 0.01796307985
Histogram with fixed size bins (bins=10)
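Common values

Value | Count | Frequency (%)
0.5 | 211 | 4.1%
0.46 | 196 | 3.8%
0.54 | 191 | 3.7%
0.44 | 182 | 3.5%
0.48 | 165 | 3.2%
0.52 | 162 | 3.1%
0.38 | 162 | 3.1%
0.45 | 157 | 3.0%
0.47 | 156 | 3.0%
0.49 | 154 | 3.0%
Other values (80) | 3427 | 66.4%

Minimum 5 values

Value | Count | Frequency (%)
0.22 | 1 | < 0.1%
0.23 | 1 | < 0.1%
0.25 | 4 | 0.1%
0.26 | 3 | 0.1%
0.27 | 10 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
1.13 | 1 | < 0.1%
1.12 | 1 | < 0.1%
1.11 | 1 | < 0.1%
1.1 | 1 | < 0.1%
1.08 | 2 | < 0.1%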

alcohol
Real number (ℝ≥0)

Distinct count: 110
Unique (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.558407902382335
Minimum: 8.0
Maximum: 14.2
Zeros: 0
Zeros (%): 0.0%
Memory size: 40.3 KiB

Quantile statistics

Minimum: 8
5th percentile: 9
Q1: 9.5
Median: 10.4
Q3: 11.4
95th percentile: 12.7
Maximum: 14.2
Range: 6.2
Interquartile range (IQR): 1.9

Descriptive statistics

Standard deviation: 1.186362275
Coefficient of variation (CV): 0.1123618529
Kurtosis: -0.5685316314
Mean: 10.5584079
Median Absolute Deviation (MAD): 0.9
Skewness: 0.5319390182
Sum: 54513.06
Variance: 1.407455448
Histogram with fixed size bins (bins=10)
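Common values

Value | Count | Frequency (%)
9.5 | 274 | 5.3%
9.4 | 249 | 4.8%
10 | 200 | 3.9%
9.2 | 197 | 3.8%
10.5 | 185 | 3.6%
11 | 171 | 3.3%
9.8 | 166 | 3.2%
10.4 | 164 | 3.2%
9.3 | 159 | 3.1%
10.2 | 153 | 3.0%
Other values (100) | 3245 | 62.9%

Minimum 5 values

Value | Count | Frequency (%)
8 | 2 | < 0.1%
8.4 | 3 | 0.1%
8.5 | 10 | 0.2%
8.6 | 15 | 0.3%
8.7 | 47 | 0.9%

Maximum 5 values

Value | Count | Frequency (%)
14.2 | 1 | < 0.1%
14.05 | 1 | < 0.1%
14 | 11 | 0.2%
13.9 | 3 | 0.1%
13.8 | 2 | < 0.1%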

quality
Real number (ℝ≥0)

Distinct count: 7
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.809413131900058
Minimum: 3
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Memory size: 40.3 KiB

Quantile statistics

Minimum: 3
5th percentile: 5
Q1: 5
Median: 6
Q3: 6
95th percentile: 7
Maximum: 9
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8723731835
Coefficient of variation (CV): 0.150165458
Kurtosis: 0.2200593955
Mean: 5.809413132
Median Absolute Deviation (MAD): 1
Skewness: 0.1914561169
Sum: 29994
Variance: 0.7610349713
Histogram with fixed size bins (bins=10)
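Common values

Value | Count | Frequency (%)
6 | 2264 | 43.9%
5 | 1690 | 32.7%
7 | 845 | 16.4%
4 | 193 | 3.7%
8 | 146 | 2.8%
3 | 20 | 0.4%
9 | 5 | 0.1%

Minimum 5 values

Value | Count | Frequency (%)
3 | 20 | 0.4%
4 | 193 | 3.7%
5 | 1690 | 32.7%
6 | 2264 | 43.9%
7 | 845 | 16.4%

Maximum 5 values

Value | Count | Frequency (%)
9 | 5 | 0.1%
8 | 146 | 2.8%
7 | 845 | 16.4%
6 | 2264 | 43.9%
5 | 1690 | 32.7%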

q_grade
Categorical

Distinct count: 3
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.1 KiB

Value | Count | Frequency (%)
medium | 3954 | 76.6%
high | 996 | 19.3%
low | 213 | 4.1%

Length

Max length: 6
Median length: 6
Mean length: 5.490412551
Min length: 3

wine_body
Categorical

Distinct count: 3
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.1 KiB

Value | Count | Frequency (%)
light | 4721 | 91.4%
medium | 395 | 7.7%
full | 47 | 0.9%

Length

Max length: 6
Median length: 5
Mean length: 5.067402673
Min length: 4

dry_sweet
Categorical

Distinct count: 5
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB

Value | Count | Frequency (%)
dry | 4264 | 82.6%
off-dry | 827 | 16.0%
bone dry | 72 | 1.4%
very sweet | 0 | 0.0%
sweet | 0 | 0.0%

Length

Max length: 8
Median length: 3
Mean length: 3.710439667
Min length: 3

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
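
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation rather than the code the report generator uses; the function name `pearson_r` is hypothetical, and the sample values are the alcohol and pH columns of the first five rows shown in the Sample section.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


alcohol = [9.4, 9.8, 9.8, 9.8, 9.4]
ph = [3.51, 3.20, 3.26, 3.16, 3.51]
r = pearson_r(alcohol, ph)

# Shifting and rescaling one variable by a positive factor leaves r unchanged.
r_rescaled = pearson_r([10.0 * a + 3.0 for a in alcohol], ph)
print(r, r_rescaled)
```

Five rows are far too few to say anything about the dataset; the point is only to show the arithmetic and the invariance under location and scale changes.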

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
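
In other words, ρ is Pearson's r applied to ranks. The sketch below assumes tied values receive the average of the ranks they span, which is a common convention but an assumption here; `average_ranks` and `spearman_rho` are hypothetical helper names.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of equal values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r; the 1/n factors cancel between numerator and denominator."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear
print(spearman_rho(x, y), pearson_r(x, y))
```

For the cubic relationship the ranks of X and Y coincide, so ρ reaches 1 while Pearson's r stays below 1.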

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
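
The definition above corresponds to the variant usually called tau-a, sketched below with a hypothetical function name `kendall_tau_a`. Note that statistical libraries often default to tau-b, which adjusts the denominator for ties, so values computed by a library on tied data may differ from this sketch.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in X or Y: counted in neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# One swapped pair out of six: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The pairwise loop is quadratic in the number of observations, which is fine for illustration but slow for large tables.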

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
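
A minimal sketch of the bias-corrected estimator follows, assuming the usual form of Bergsma's correction: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and floored at zero, and the numbers of rows r and columns k are shrunk by (r-1)²/(n-1) and (k-1)²/(n-1) respectively. The function name `cramers_v_corrected` and the choice to return 0.0 in degenerate cases are assumptions of this example, not taken from the report generator.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over every cell, including empty ones.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table, e.g. one variable is constant
    return math.sqrt(phi2_corr / denom)


grade = ["medium"] * 50 + ["high"] * 50
body_same = ["light"] * 50 + ["medium"] * 50    # determined by grade
body_mixed = ["light", "medium"] * 50           # independent of grade
print(cramers_v_corrected(grade, body_same), cramers_v_corrected(grade, body_mixed))
```

The labels are illustrative only and do not reproduce the q_grade and wine_body distributions reported above.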

Missing values

Sample

First rows

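index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | q_grade | wine_body | dry_sweet
0 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.998 | 3.51 | 0.56 | 9.4 | 5 | medium | light | dry
1 | 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.997 | 3.20 | 0.68 | 9.8 | 5 | medium | light | dry
2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 | medium | light | dry
3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 | medium | light | dry
4 | 7.4 | 0.66 | 0.00 | 1.8 | 0.075 | 13.0 | 40.0 | 0.998 | 3.51 | 0.56 | 9.4 | 5 | medium | light | dry
5 | 7.9 | 0.60 | 0.06 | 1.6 | 0.069 | 15.0 | 59.0 | 0.996 | 3.30 | 0.46 | 9.4 | 5 | medium | light | dry
6 | 7.3 | 0.65 | 0.00 | 1.2 | 0.065 | 15.0 | 21.0 | 0.995 | 3.39 | 0.47 | 10.0 | 7 | high | light | dry
7 | 7.8 | 0.58 | 0.02 | 2.0 | 0.073 | 9.0 | 18.0 | 0.997 | 3.36 | 0.57 | 9.5 | 7 | high | light | dry
8 | 7.5 | 0.50 | 0.36 | 6.1 | 0.071 | 17.0 | 102.0 | 0.998 | 3.35 | 0.80 | 10.5 | 5 | medium | light | dry
9 | 6.7 | 0.58 | 0.08 | 1.8 | 0.097 | 15.0 | 65.0 | 0.996 | 3.28 | 0.54 | 9.2 | 5 | medium | light | dry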

Last rows

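index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | q_grade | wine_body | dry_sweet
5153 | 6.8 | 0.220 | 0.36 | 1.20 | 0.052 | 38.0 | 127.0 | 0.993 | 3.04 | 0.54 | 9.2 | 5 | medium | light | dry
5154 | 4.9 | 0.235 | 0.27 | 11.75 | 0.030 | 34.0 | 118.0 | 0.995 | 3.07 | 0.50 | 9.4 | 6 | medium | light | off-dry
5155 | 6.1 | 0.340 | 0.29 | 2.20 | 0.036 | 25.0 | 100.0 | 0.989 | 3.06 | 0.44 | 11.8 | 6 | medium | light | dry
5156 | 5.7 | 0.210 | 0.32 | 0.90 | 0.038 | 38.0 | 121.0 | 0.991 | 3.24 | 0.46 | 10.6 | 6 | medium | light | bone dry
5157 | 6.5 | 0.230 | 0.38 | 1.30 | 0.032 | 29.0 | 112.0 | 0.993 | 3.29 | 0.54 | 9.7 | 5 | medium | light | dry
5158 | 6.2 | 0.210 | 0.29 | 1.60 | 0.039 | 24.0 | 92.0 | 0.991 | 3.27 | 0.50 | 11.2 | 6 | medium | light | dry
5159 | 6.6 | 0.320 | 0.36 | 8.00 | 0.047 | 57.0 | 168.0 | 0.995 | 3.15 | 0.46 | 9.6 | 5 | medium | light | dry
5160 | 6.5 | 0.240 | 0.19 | 1.20 | 0.041 | 30.0 | 111.0 | 0.993 | 2.99 | 0.46 | 9.4 | 6 | medium | light | dry
5161 | 5.5 | 0.290 | 0.30 | 1.10 | 0.022 | 20.0 | 110.0 | 0.989 | 3.34 | 0.38 | 12.8 | 7 | high | medium | dry
5162 | 6.0 | 0.210 | 0.38 | 0.80 | 0.020 | 22.0 | 98.0 | 0.989 | 3.26 | 0.32 | 11.8 | 6 | medium | light | bone dry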